
[metrics] add block analyzer queuelength metrics #577

Merged
merged 1 commit into main from andrew7234/block-analyzer-queuelength on Jan 17, 2024

Conversation

Andrew7234
Collaborator

@Andrew7234 Andrew7234 commented Nov 30, 2023

Task:

Expose a 'queueLength'-equivalent metric in the block analyzers

For more context, see https://github.com/oasisprotocol/nexus/pull/511/files#r1318043060

It'd be nice to report the difference between the indexed block height and the node block height in Prometheus. We already have both heights stored by the node_stats analyzer.

This PR

To enable the queuelength metric for runtime block analyzers, nexus needs the current chain heights of the runtimes. The node_stats analyzer currently fetches/stores this data only for consensus, so this PR also adds runtime support for the node_stats analyzer.

Note: We could avoid this by fetching the chain heights directly in the block analyzers. Although the height would be fresher, it would also require a second round-trip in the main block processing loop. Since the metrics are internal-only, I opted to use the chain heights we already store in chain.latest_node_heights. Open to suggestions, though.

Alternatively, we could update the metric in a concurrent loop separate from the main block processing loop. I'm not sure that's worth the extra complexity, since slow-sync is, by name, expected to be slower.
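
For illustration, a minimal sketch of the gauge wiring using the Prometheus Go client; newQueueLengthGauge and updateQueueLength are placeholder names, not the exact code in this PR:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// newQueueLengthGauge registers a gauge named like the output below,
// e.g. "emerald_queue_length". (Hypothetical helper for illustration.)
func newQueueLengthGauge(analyzerName string) prometheus.Gauge {
	return promauto.NewGauge(prometheus.GaugeOpts{
		Name: analyzerName + "_queue_length",
		Help: "number of blocks left to process for the " + analyzerName + " analyzer",
	})
}

// updateQueueLength sets the gauge from the node height stored by the
// node_stats analyzer (chain.latest_node_heights) and the analyzer's own
// last processed height; both are assumed to be fetched elsewhere.
func updateQueueLength(g prometheus.Gauge, latestNodeHeight, lastProcessedHeight int64) {
	if latestNodeHeight <= lastProcessedHeight {
		g.Set(0)
		return
	}
	g.Set(float64(latestNodeHeight - lastProcessedHeight))
}
```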

~/oasis/oasis-block-indexer(andrew7234/block-analyzer-queuelength*) » curl localhost:8009/metrics | grep queue_length                 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43060    0 43060    0     0  7934k      0 --:--:-- --:--:-- --:--:-- 41.0M
# HELP consensus_queue_length number of blocks left to process for the consensus analyzer
# TYPE consensus_queue_length gauge
consensus_queue_length 0 
^ This reads 0 because I couldn't run the consensus analyzer locally; the default value of a Prometheus gauge is 0, which is what we see here.

# HELP emerald_queue_length number of blocks left to process for the emerald analyzer
# TYPE emerald_queue_length gauge
emerald_queue_length 2.392834e+06
# HELP sapphire_queue_length number of blocks left to process for the sapphire analyzer
# TYPE sapphire_queue_length gauge
sapphire_queue_length 3.726054e+06

@Andrew7234 Andrew7234 changed the base branch from main to mitjat/3phase-timings November 30, 2023 22:03
analyzer/block/block.go (outdated review thread; resolved)
@Andrew7234 Andrew7234 force-pushed the andrew7234/block-analyzer-queuelength branch from 34865fa to 83c67c6 Compare December 1, 2023 23:04
@Andrew7234 Andrew7234 changed the title Andrew7234/block analyzer queuelength [metrics] add block analyzer queuelength metrics Dec 1, 2023
@Andrew7234 Andrew7234 marked this pull request as ready for review December 4, 2023 13:08
Contributor

@mitjat mitjat left a comment

Thanks Andy!

analyzer/node_stats/node_stats.go (outdated review thread; resolved)
analyzer/block/block.go (3 outdated review threads; resolved)
if err1 != nil {
	return nil, err1
}
return nodestats.NewAnalyzer(cfg.Analyzers.NodeStats.ItemBasedAnalyzerConfig, sourceClient, emeraldClient, sapphireClient, dbClient, logger)
Contributor

Thank you for expanding the node stats analyzer!

It's not great that we're hardcoding the dependency on all the runtimes here. Imagine Sapphire becomes unavailable and we still want stats for consensus, or we want to test node_stats locally or in (nonexistent :/) tests but don't want to go through the hassle of setting up access to all the runtimes.

Three options I can think of:

  • Add a config flag (list) for specifying which layers to include; this is probably the most "proper" solution, and quite usable too if the default is [consensus, sapphire, emerald]. See the sketch after this comment.
  • Decide which layers to include based on the presence/absence of their block analyzers. That's very convenient but leads to implicit dependencies between analyzers / sections of config, which is super ugly.
  • Leave as-is. If e.g. Sapphire becomes unavailable, things will continue to work (thanks to lazy initialization of node clients in Connect to oasis-node lazily #555); we'll just see a good amount of error spam in the logs.

Your call on whether to go with 1 or 3; I think both are justifiable. I'm partly writing them out to check whether option 3 holds the way I've described it.
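
For concreteness, a minimal sketch of what option 1 could look like; the Layers field, the yaml tag, and the defaulting helper are just a sketch, not necessarily the final shape:

```go
package config

// NodeStatsConfig configures the node_stats analyzer (sketch only).
type NodeStatsConfig struct {
	// Layers selects which layers the analyzer queries for node heights.
	// An empty list falls back to the proposed default.
	Layers []string `yaml:"layers"`
}

// LayersOrDefault applies the default suggested in option 1.
func (c NodeStatsConfig) LayersOrDefault() []string {
	if len(c.Layers) == 0 {
		return []string{"consensus", "sapphire", "emerald"}
	}
	return c.Layers
}
```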

Collaborator Author

Went with option 1!

Some notes for future reference:
The code here creates the consensus/runtime clients for all layers, even those not specified in the config file. The LazyGrpcConnect function mentioned above lets us connect to oasis-node only when needed. However, when we instantiate a new runtime client we also make a connections.SDKConnect call a few lines above here, which looks like it could error. The underlying connection code only establishes the connection and checks the chainContext; it does not fetch any runtime-specific info. In almost all cases, the default RPC node specified in the config file will pass this check, so the only failure case should be a runtime node that is explicitly specified and down. In that case the failure is immediate and obvious, and can be worked around by either a) restoring the node or b) removing the problematic layer from the node-stats config list.
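
A rough sketch of the setup logic described above; RuntimeClient, dialRuntime, and clientsForLayers are placeholder names, and the real code goes through connections.SDKConnect and the lazy gRPC dial from #555:

```go
package nodestats

import (
	"context"
	"fmt"
)

// RuntimeClient is a stand-in for the real runtime client type.
type RuntimeClient interface{}

// dialRuntime stands in for the lazy connection setup; it only fails fast
// when an explicitly configured runtime node is unreachable.
func dialRuntime(ctx context.Context, layer string) (RuntimeClient, error) {
	return struct{}{}, nil
}

// clientsForLayers builds runtime clients only for the layers listed in the
// node-stats config; consensus is served by the existing consensus client.
func clientsForLayers(ctx context.Context, layers []string) (map[string]RuntimeClient, error) {
	clients := make(map[string]RuntimeClient, len(layers))
	for _, layer := range layers {
		if layer == "consensus" {
			continue
		}
		c, err := dialRuntime(ctx, layer)
		if err != nil {
			// Per the notes above: restore the node or drop the layer
			// from the node-stats config list.
			return nil, fmt.Errorf("node_stats: connecting to %s: %w", layer, err)
		}
		clients[layer] = c
	}
	return clients, nil
}
```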

@Andrew7234 Andrew7234 force-pushed the andrew7234/block-analyzer-queuelength branch from 83c67c6 to 81ff26a Compare December 14, 2023 06:00
@mitjat mitjat force-pushed the mitjat/3phase-timings branch 2 times, most recently from 3a832f7 to 816e08a Compare December 23, 2023 00:18
Base automatically changed from mitjat/3phase-timings to main December 23, 2023 00:23
Contributor

@mitjat mitjat left a comment

Thank you! There are one or two not entirely trivial comment threads still unresolved, but I expect smooth sailing from here on out. I'm LGTM'ing now since the holidays will make it harder to sync.

It looks like you might also have to be careful with the rebase now that #572 is merged? Hopefully not 🤞

analyzer/block/block.go (2 outdated review threads; resolved)
analyzer/consensus/consensus.go (outdated review thread; resolved)
@Andrew7234 Andrew7234 force-pushed the andrew7234/block-analyzer-queuelength branch 2 times, most recently from 48e2d9d to 15b3db5 Compare January 4, 2024 17:47
@Andrew7234 Andrew7234 force-pushed the andrew7234/block-analyzer-queuelength branch 2 times, most recently from 48e50fc to d83c474 Compare January 12, 2024 19:41
nit

enable node stats analyzer for runtimes

wip

tweaks

nit

misc

address comments

address comments

nit

fix tests/nits

lint
@Andrew7234 Andrew7234 force-pushed the andrew7234/block-analyzer-queuelength branch from d83c474 to cf37b92 Compare January 17, 2024 21:27
@Andrew7234 Andrew7234 merged commit fc099d7 into main Jan 17, 2024
6 checks passed
@Andrew7234 Andrew7234 deleted the andrew7234/block-analyzer-queuelength branch January 17, 2024 21:37